String Pattern Discovery

Author

  • Ayumi Shinohara
Abstract

Finding a good pattern that discriminates one set of strings from another is a critical task in knowledge discovery. In this paper, we review a series of our works on string pattern discovery. These include theoretical analyses of the learnability of some pattern classes, as well as the development of practical data structures that support efficient string processing.
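As a rough illustration of the discrimination task described in the abstract, the sketch below searches by brute force for the substring that best separates a positive set from a negative set. It is not the method of the reviewed works: the restriction to substring patterns, the score (positive matches minus negative matches), the tie-breaking rule, and the name `best_substring_pattern` are all assumptions made for this example, and it assumes at least one non-empty positive string.

```python
def substrings(s):
    """Return the set of all non-empty substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def best_substring_pattern(positives, negatives):
    """Brute-force search for a discriminating substring pattern.

    Hypothetical scoring: (number of positive strings containing the
    pattern) minus (number of negative strings containing it).
    Ties are broken in favour of shorter, then lexicographically
    smaller, patterns.
    """
    # Only substrings of positive strings can have a positive score.
    candidates = set().union(*(substrings(s) for s in positives))

    def score(p):
        tp = sum(p in s for s in positives)
        fp = sum(p in s for s in negatives)
        return tp - fp

    # sorted() makes the result deterministic; max() keeps the first
    # candidate among those with an equal key.
    best = max(sorted(candidates), key=lambda p: (score(p), -len(p)))
    return best, score(best)


if __name__ == "__main__":
    pos = ["abcab", "xabz", "ab"]
    neg = ["ba", "ccc", "zx"]
    print(best_substring_pattern(pos, neg))
```

The exhaustive enumeration is quadratic in the length of each positive string per candidate set, so it is only meant for small inputs; the data structures mentioned in the abstract address exactly this kind of cost.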


Similar resources

A string pattern regression algorithm and its application to pattern discovery in long introns.

We present a new approach to pattern discovery called string pattern regression, where we are given a data set that consists of a string attribute and an objective numerical attribute. The problem is to find the best string pattern that divides the data set in such a way that the distribution of the numerical attribute values of the set for which the pattern matches the string attribute, is mos...


A Template Discovery Algorithm by Substring Amplification

In this paper, we consider the problem of finding a set of substrings common to given strings. We define this as the template discovery problem: given a set of strings generated by some fixed but unknown pattern, find the constant parts of the pattern. A pattern is a string over constant and variable symbols. It generates strings by replacing variables with constant strings. We assume that...


Online Grammar Compression for Frequent Pattern Discovery

Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is just a frequent pattern discovery by grammatical inference. While we can get any frequent pattern in linear time usi...


String Kernels Based on Variable-Length-Don't-Care Patterns

We propose a new string kernel based on variable-length-don’t-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ∪{⋆})∗, where Σ is an alphabet and ⋆ is the variable-length-don’t-care symbol that matches any string in Σ∗. The number of VLDC patterns matching a given string s of length n is O(2). We present an O(n) algorithm for computing the kernel value. We also propose variations...
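To make the definition of a VLDC pattern concrete, here is a minimal sketch of testing whether one pattern matches a whole string. It only illustrates the matching relation, not the kernel computation from the paper. The function name `vldc_match`, the use of the ASCII character `*` to stand in for ⋆, and the assumption that `*` does not occur in the alphabet are choices made for this example.

```python
def vldc_match(pattern, s, star="*"):
    """Return True if the VLDC pattern matches the entire string s.

    `star` plays the role of the variable-length-don't-care symbol:
    it matches any string over the alphabet, including the empty one.
    Every other pattern symbol must match exactly one equal character.
    """
    n = len(s)
    # reachable[j] is True when the pattern prefix consumed so far
    # can match the string prefix s[:j].
    reachable = [True] + [False] * n
    for c in pattern:
        if c == star:
            # A star may absorb any run of characters, so every position
            # at or after a reachable one becomes reachable.
            seen = False
            for j in range(n + 1):
                seen = seen or reachable[j]
                reachable[j] = seen
        else:
            # A constant symbol advances by exactly one matching character.
            nxt = [False] * (n + 1)
            for j in range(n):
                if reachable[j] and s[j] == c:
                    nxt[j + 1] = True
            reachable = nxt
    return reachable[n]


if __name__ == "__main__":
    for pat, text in [("a*b", "axxb"), ("a*b", "axxbc"), ("*", "")]:
        print(pat, repr(text), vldc_match(pat, text))
```

This runs in time proportional to the product of the pattern length and the string length, which is enough for checking individual patterns by hand.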


Text Data Mining with Optimized Pattern Discovery

This paper describes an application of the optimized pattern discovery framework to text and Web mining. In particular, we introduce a class of simple combinatorial patterns over phrases, called proximity phrase association patterns, and consider the problem of finding the patterns that optimize a given statistical measure in a large collection of unstructured texts. For this class of patterns, ...




Publication date: 2004